The human genome contracts again

Authors

  • Dmitri S. Pavlichin
  • Tsachy Weissman
  • Golan Yona

Abstract

The number of human genomes that have been sequenced completely for different individuals has increased rapidly in recent years. Storing complete genomes and transferring them between computers for analysis with various applications and tools will soon become a major hurdle, hindering the analysis phase. There is therefore a growing need to compress these data efficiently. Here, we describe a technique to compress human genomes based on entropy coding, using a reference genome and known single nucleotide polymorphisms (SNPs). Furthermore, we explore several intrinsic features of genomes and information in other genomic databases to further improve the compression attained. Using these methods, we compress James Watson's genome to 2.5 megabytes (MB), improving on recent work by 37%. Similar compression is obtained for most genomes available from the 1000 Genomes Project. Our biologically inspired techniques promise even greater gains for genomes of lower organisms and for human genomes as more genomic data become available.

Availability: Code is available at sourceforge.net/projects/genomezip/
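The abstract names the ingredients (a reference genome, a database of known SNPs, and entropy coding) but not how they combine. The sketch below shows one way such ingredients can fit together. It rests on simplifying assumptions that are not taken from the paper. Each individual is reduced to the set of reference positions where they differ from the reference. Variants at known SNP sites are stored as one 0/1 flag per known site, and that flag sequence is costed by its order-0 empirical entropy, which is roughly what an ideal arithmetic coder would approach. Novel variants are stored as gaps between successive positions, with Elias gamma codes used as a plainly named stand-in for a tuned entropy coder. Alleles, genotypes, and indels are ignored. All function names and the toy positions are hypothetical and are not the genomezip API.

```python
import math


def encode(sample_positions, known_positions):
    """Split an individual's variant positions into a known-SNP bitmap and novel-variant gaps.

    Assumption (not from the paper): positions are 1-based and positive,
    and known_positions is sorted.
    """
    sample = set(sample_positions)
    known = set(known_positions)
    # One flag per known SNP site: 1 if this individual carries a variant there.
    bitmap = [1 if p in sample else 0 for p in known_positions]
    # Variants missing from the known-SNP list are stored as gaps between positions.
    novel = sorted(sample - known)
    gaps, prev = [], 0
    for p in novel:
        gaps.append(p - prev)
        prev = p
    return bitmap, gaps


def decode(bitmap, gaps, known_positions):
    """Invert encode(): recover the sorted list of variant positions."""
    positions = [p for p, bit in zip(known_positions, bitmap) if bit]
    pos = 0
    for g in gaps:
        pos += g  # running sum of gaps restores absolute positions
        positions.append(pos)
    return sorted(positions)


def bitmap_cost_bits(bitmap):
    """Order-0 empirical entropy n*H(p) of the bitmap in bits (ignores the cost of sending p)."""
    n, ones = len(bitmap), sum(bitmap)
    if ones in (0, n):
        return 0.0
    p = ones / n
    return -n * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def elias_gamma_bits(n):
    """Length of the Elias gamma codeword for an integer n >= 1."""
    return 2 * n.bit_length() - 1


def estimated_size_bits(bitmap, gaps):
    """Estimated total: entropy-coded bitmap plus gamma-coded novel gaps."""
    return bitmap_cost_bits(bitmap) + sum(elias_gamma_bits(g) for g in gaps)


if __name__ == "__main__":
    known = [100, 250, 400, 1000, 1500, 2000, 2600, 3100]  # toy "known SNP" sites
    sample = [250, 1500, 1800, 3100, 5000]                   # toy individual
    bitmap, gaps = encode(sample, known)
    print(bitmap, gaps)  # [0, 1, 0, 0, 1, 0, 0, 1] [1800, 3200]
    print(round(estimated_size_bits(bitmap, gaps), 2))  # 51.64
    assert decode(bitmap, gaps, known) == sorted(sample)  # lossless round trip
```

In this toy setup, a variant at a known site costs only a share of the bitmap's entropy, and that share shrinks as the flags become sparser. A novel variant pays for its full position. This is one intuition for why a richer database of known variants can help a reference-based scheme, though the paper's actual encoding may differ.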


Similar articles

O-14: General Governing Rules of ART Contracts Involving Third Parties

Background: ART contracts involving third parties have emerged as clinical reproductive treatments have become widespread globally. Iran is a pioneer in applying these treatments in the Middle East because Shia jurisprudence sanctions them. This key regional role has increased Iranian jurists' responsibility for developing a legal system governing the administration of ART. The most significant part of...


O-38: Concurrent Whole-Genome Haplotyping and Copy-Number Profiling of Single Cells

Background: Methods for haplotyping and DNA copy-number typing of single cells are paramount for studying genomic heterogeneity and enabling genetic diagnosis. Before analyzing the DNA of a single cell by microarray or next-generation sequencing, a whole-genome amplification (WGA) process is required, but it substantially distorts the frequency and composition of the cell's alleles. As a conseque...


I-44: Concurrent Whole-Genome Haplotyping and Copy-Number Profiling of Single Cells

Background: Methods for haplotyping and DNA copy-number typing of single cells are paramount for studying genomic heterogeneity and enabling genetic diagnosis. Before analyzing the DNA of a single cell by microarray or next-generation sequencing, a whole-genome amplification (WGA) process is required, but it substantially distorts the frequency and composition of the cell's alleles. As a conseque...


MicroRNAs as Immune Regulators of Inflammation in Children with Epilepsy

Epilepsy is a chronic clinical syndrome of brain function caused by abnormal discharge of neurons. MicroRNAs (miRNAs) are small noncoding RNAs that act post-transcriptionally to negatively regulate protein levels. They affect neuroinflammatory signaling, glial and neuronal structure and function, neurogenesis, cell death, and other processes linked to epileptogenesis. The aim of this ...


I-38: Chromosome Instability in The Cleavage Stage Embryo

Recently, we demonstrated chromosome instability (CIN) in human cleavage-stage embryogenesis following in vitro fertilization (IVF). CIN does not necessarily undermine normal human development (e.g., when the remaining normal diploid blastomeres develop the embryo proper); however, it can spark a spectrum of conditions, including loss of conception, genetic disease, and the development of genetic variation. To s...



Journal title:
  • Bioinformatics

Volume 29, Issue 17

Publication date: 2013